Extraction de motifs séquentiels dans les flux de données. (Sequential patterns mining from data streams)

Author

Alice Marascu

Abstract

In recent years, many applications dealing with data generated continuously and at high speed have emerged. Such data are now referred to as data streams. Dealing with potentially infinite quantities of data imposes constraints that raise many processing problems. Examples include the impossibility of blocking the data stream and the need to produce results in real time. Nevertheless, many application areas (such as bank transactions, Web usage and network monitoring) have attracted a lot of interest in both industry and academia. These potentially infinite quantities of data rule out any hope of complete storage; we nevertheless need to be able to examine the history of the data streams. This led to the compromise of "summaries" of the data stream and "approximate" results. A very large number of different types of data stream summaries have been proposed to date. However, continuous developments in technology and in the corresponding applications demand similar progress in summarization and analysis methods. Moreover, sequential pattern extraction has received little attention: when this thesis began, there were no methods for extracting sequential patterns from data streams. Motivated by this context, we are interested in a method that summarizes the data stream efficiently and reliably and whose main purpose is the extraction of sequential patterns. In this thesis, we propose the CLUSO (Clustering, Summarizing and Outlier detection) approach. CLUSO allows us to obtain clusters from a stream of sequences of itemsets, to compute and maintain histories of these clusters, and to detect outliers. The contributions detailed in this report concern:

Clustering sequences of itemsets in data streams. To the best of our knowledge, this is the first work in this domain.

Summarizing data streams by way of sequential pattern extraction. Summaries given by CLUSO consist of aligned sequential patterns representing clusters, associated with their history in the stream. The set of such patterns is a reliable summary of the stream at time t. Managing the history of these patterns is a crucial point in stream analysis. With CLUSO we introduce a new way of managing time granularity in order to optimize this history.

Outlier detection. When applied to data streams, this detection must be fast and reliable. More precisely, stream constraints forbid requiring parameters or adjustments from the end user, because ignored outliers or their late detection can be detrimental. Outlier detection in CLUSO is automated and self-adjusting.

We also present a case study on real data, written in collaboration with Orange Labs.
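
The abstract says that CLUSO summarizes each cluster as an aligned sequential pattern, but it does not describe the alignment procedure. As a rough illustration of what aligning sequences of itemsets can mean, the minimal sketch below performs a global (Needleman-Wunsch-style) alignment of two sequences. It then merges the aligned positions into item counts. The Jaccard-distance substitution cost, the gap cost `GAP_COST` and the functions `itemset_distance`, `align` and `aligned_pattern` are hypothetical choices made for this example; this is not the CLUSO algorithm.

```python
from collections import Counter
from typing import FrozenSet, List, Optional, Tuple

Itemset = FrozenSet[str]
Sequence = List[Itemset]
Pair = Tuple[Optional[Itemset], Optional[Itemset]]

# Hypothetical cost of aligning an itemset against a gap.
GAP_COST = 1.0


def itemset_distance(a: Itemset, b: Itemset) -> float:
    """Jaccard distance between two itemsets: 0 if identical, 1 if disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def align(s: Sequence, t: Sequence) -> List[Pair]:
    """Global (Needleman-Wunsch-style) alignment of two sequences of itemsets."""
    n, m = len(s), len(t)
    # cost[i][j] is the minimal cost of aligning s[:i] with t[:j].
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * GAP_COST
    for j in range(1, m + 1):
        cost[0][j] = j * GAP_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + itemset_distance(s[i - 1], t[j - 1]),
                cost[i - 1][j] + GAP_COST,  # s[i-1] aligned with a gap
                cost[i][j - 1] + GAP_COST,  # t[j-1] aligned with a gap
            )
    # Trace back from cost[n][m] to recover the aligned pairs (None marks a gap).
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + itemset_distance(
            s[i - 1], t[j - 1]
        ):
            pairs.append((s[i - 1], t[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + GAP_COST:
            pairs.append((s[i - 1], None))
            i -= 1
        else:
            pairs.append((None, t[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def aligned_pattern(s: Sequence, t: Sequence) -> List[Counter]:
    """Merge two aligned sequences into a weighted pattern: one item counter per position."""
    pattern = []
    for a, b in align(s, t):
        counts: Counter = Counter()
        for itemset in (a, b):
            if itemset is not None:
                counts.update(itemset)
        pattern.append(counts)
    return pattern
```

For instance, aligning <(a)(b c)(d)> with <(a)(c)(d)> gives the weighted pattern <(a:2)(b:1 c:2)(d:2)>. A cluster summary built this way could keep only the items whose weight exceeds some fraction of the cluster size. That threshold would be a further assumption.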

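One common way to bound the memory used by cluster histories is a logarithmic tilted-time window. In such a window, recent batches are stored at fine granularity and older ones are merged into progressively coarser buckets. The abstract describes CLUSO's time-granularity management as new, so the sketch below shows only this generic, well-known technique for comparison. It is not the CLUSO method. The class `TiltedTimeWindow`, the limit `MAX_PER_LEVEL` and the idea of recording one count (for example, a cluster's size) per batch are hypothetical.

```python
from typing import List, Tuple

# Hypothetical limit: keep at most this many buckets of each granularity.
MAX_PER_LEVEL = 2


class TiltedTimeWindow:
    """Logarithmic tilted-time window over per-batch counts (e.g. a cluster's size).

    Buckets are stored newest first as (count, span), where span is the number of
    batches the bucket covers; spans are powers of two and never decrease with age.
    """

    def __init__(self) -> None:
        self.buckets: List[Tuple[int, int]] = []

    def add_batch(self, count: int) -> None:
        """Record the count observed in the newest batch, then compress."""
        self.buckets.insert(0, (count, 1))
        self._compress()

    def _compress(self) -> None:
        span = 1
        while True:
            positions = [k for k, (_, s) in enumerate(self.buckets) if s == span]
            if len(positions) <= MAX_PER_LEVEL:
                return  # coarser levels were already within the limit
            # Merge the two oldest buckets of this granularity (they are adjacent).
            a, b = positions[-2], positions[-1]
            merged = (self.buckets[a][0] + self.buckets[b][0], 2 * span)
            self.buckets[a:b + 1] = [merged]
            span *= 2  # the merge may overflow the next, coarser level

    def total(self) -> int:
        """Total count over every batch seen so far."""
        return sum(c for c, _ in self.buckets)
```

Consider seven batches with counts 10, 20, ..., 70. After they are added, the window holds three buckets covering 1, 2 and 4 batches: [(70, 1), (110, 2), (100, 4)]. The number of buckets therefore grows only logarithmically with the number of batches, at the cost of losing per-batch detail for older data.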

Similar sources

Extraction de motifs séquentiels dans les flots de données d'usage du Web

Abstract. In recent years, new constraints have emerged for data mining techniques. These constraints are typical of a new kind of data: "data streams". In a mining process applied to a data stream, memory usage is limited, and new elements are generated continuously and must be processed as quickly as possible, ...

Extraction De Motifs Séquentiels Dans Des Données Multidimensionelles. (Mining Sequential Patterns In Multidimensional Data)

SPAMS: Une nouvelle approche incrémentale pour l'extraction de motifs séquentiels fréquents dans les data streams

Abstract. Mining frequent sequential patterns from data streams is an important problem addressed by the data mining research community. Even more than with databases, many additional constraints must be taken into account because of the intrinsic nature of streams. In this paper, we propose a new one-pass algorithm, SPAMS, based on the ...

Préservation de la vie privée. Recherche de motifs séquentiels dans des bases de données distribuées

Extracting knowledge without disclosing any individual or sensitive information is a new challenging problem for the data mining community. In this paper, we present a new algorithm PRIPSEP (privacy preserving sequential patterns) for the mining of sequential patterns from distributed databases while preserving privacy. We prove that our architecture and protocols employed by our algorithm are ...

Une approche centroïde pour la classification de séquences dans les data streams

In recent years, emerging applications introduced new constraints for data mining methods. These constraints are typical of a new kind of data: the data streams. In a data stream processing, memory usage is restricted, new elements are generated continuously and have to be considered as fast as possible, no blocking operator can be performed and the data can be examined only once. At this time ...

Extraction de motifs séquentiels. Problèmes et méthodes

SYNOPSIS. At first glance, the problem of extracting sequential patterns may seem close to that of extracting association rules. This comparison turns out to be very fragile, however, because of a key element specific to sequential pattern extraction: temporality. This notion makes it possible both to distinguish, within records, an order of appearance...

Publication date: 2009
